Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Abstract

Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses the merit of two-stage methods: the object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM to align with the HOI dataset's priors. To overcome the static semantic embedding problem, we propose to generate cross-modality-aware visual and semantic features by Cross-Modal Calibration (CMC). The above modules combined compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating statistical prior knowledge and produce state-of-the-art performance. More detailed analysis indicates that the proposed modules serve as a stronger verb predictor and a superior method of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.
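
The idea that the object in a triplet gives clues to the verb can be illustrated with a simple counting exercise. The sketch below is not the paper's Verb Semantic Model or SKL loss; it only assumes that annotations come as (human, verb, object) triplets and estimates how often each verb occurs with each object. The function name `verb_priors` and the toy annotations are hypothetical.

```python
from collections import Counter, defaultdict


def verb_priors(triplets):
    """Estimate P(verb | object) by counting (human, verb, object) triplets."""
    counts = defaultdict(Counter)
    for _human, verb, obj in triplets:
        counts[obj][verb] += 1

    priors = {}
    for obj, verb_counts in counts.items():
        total = sum(verb_counts.values())
        # Normalize the per-object verb counts into a probability distribution.
        priors[obj] = {verb: n / total for verb, n in verb_counts.items()}
    return priors


# Toy annotations (hypothetical, not taken from any benchmark).
toy = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "repair", "bicycle"),
    ("person", "hold", "cup"),
]
priors = verb_priors(toy)
```

In this toy set, "ride" accounts for two of the three bicycle annotations, so knowing the object is a bicycle already narrows the plausible verbs considerably.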

Related resources

Detecting and Recognizing Human-Object Interactions

To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting 〈human, verb, object〉 triplets in challenging everyday photos. We propose a novel mod...

Cross-Modal Object Recognition Is Viewpoint-Independent

Background: Previous research suggests that visual and haptic object recognition are viewpoint-dependent both within- and cross-modally. However, this conclusion may not be generally valid as it was reached using objects oriented along their extended y-axis, resulting in differential surface processing in vision and touch. In the present study, we removed this differential by presenting objects ...

Human concerned object detecting in video

The purpose of our work is to detect the object that a human in a video is attending to. For security reasons, event detection in video meets potential economic and social needs, and detecting the object of human attention is very helpful for event detection. In some emergencies or special events, people focus on a specific object. We need to locate the human body and face, detect the sight direction, and determine the o...

Modal Object Diagrams

While object diagrams (ODs) are widely used as a means to document object-oriented systems, they are expressively weak, as they are limited to describe specific possible snapshots of the system at hand. In this paper we introduce modal object diagrams (MODs), which extend the classical OD language with positive/negative and example/invariant modalities. The extended language allows the designer...

Declarative Semantics in Object-Oriented Software Development - A Taxonomy and Survey

One of the modern paradigms to develop an application is object oriented analysis and design. In this paradigm, there are several objects and each object plays some specific roles in applications. In an application, we must distinguish between procedural semantics and declarative semantics for their implementation in a specific programming language. For the procedural semantics, we can write a ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i3.20229